Performance Analysis of Multi-armed Bandit Algorithm with Negative Autocorrelation
Authors
Abstract
Similar resources
Multi–Armed Bandit for Pricing
This paper studies Multi–Armed Bandit (MAB) approaches for pricing applications, where a seller needs to identify the selling price for a particular kind of item that maximizes her/his profit without knowing the buyer demand. We propose modifications to the popular Upper Confidence Bound (UCB) bandit algorithm exploiting two peculiarities of pricing applications: 1) as the selling...
Online Multi-Armed Bandit
We introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point, the player can either choose to pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In t...
The multi-armed bandit problem with covariates
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewa...
Multi-armed bandit problem with precedence relations
Abstract: Consider a multi-phase project management problem where the decision maker needs to deal with two issues: (a) how to allocate resources to projects within each phase, and (b) when to enter the next phase, so that the total expected reward is as large as possible. We formulate the problem as a multi-armed bandit problem with precedence relations. In Chan, Fuh and Hu (2005), a class of ...
Multi-armed Bandit Problems with Strategic Arms
We study a strategic version of the multi-armed bandit problem, where each arm is an individual strategic agent and we, the principal, pull one arm each round. When pulled, the arm receives some private reward v_a and can choose an amount x_a to pass on to the principal (keeping v_a − x_a for itself). All non-pulled arms get reward 0. Each strategic arm tries to maximize its own utility over the cour...
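The pricing abstract in the list above builds on the standard Upper Confidence Bound (UCB) bandit algorithm. As a point of reference, here is a minimal sketch of plain UCB1 in Python, assuming rewards in [0, 1]. It does not include the pricing-specific modifications that abstract proposes, and the function name `ucb1` and the simulated Bernoulli arms are hypothetical illustrations.

```python
import math
import random


def ucb1(reward_fns, horizon, seed=0):
    """Plain UCB1 (hypothetical helper): reward_fns[i](rng) returns a reward in [0, 1].

    Returns (counts, means): pulls per arm and empirical mean reward per arm.
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each has an initial estimate.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus exploration bonus sqrt(2 ln t / n_i).
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    # Hypothetical Bernoulli arms with success probabilities 0.2, 0.5 and 0.8.
    arms = [lambda rng, p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
    counts, means = ucb1(arms, horizon=5000, seed=1)
    print(counts, [round(m, 3) for m in means])
```

With these simulated arms, most of the 5,000 pulls should go to the third arm, because the exploration bonus of an arm shrinks as its pull count grows while poorer arms are revisited only occasionally.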
Journal
Journal title: Journal of Signal Processing
Year: 2013
ISSN: 1342-6230, 1880-1013
DOI: 10.2299/jsp.17.119